Note: This is a special text-only version of the QuickStart Guide. All pictures and their references have been removed.
This QuickStart Guide introduces you to Data Desk and how it works. It is not meant to be a comprehensive manual but rather a general introduction to working with and navigating through the program. Although you can learn about Data Desk simply by reading through this guide, we hope that you will take time to try out the sample analysis yourself and begin to see data analysis through brand new eyes.
This guide is divided into short sections, each discussing a single action, method, or result. You can stop at the end of any section and continue later. Each step of the analysis is set out as an instruction like this:
* Turn on your computer.
The data analysis example in this guide starts with basic graphics and simple statistics. We gradually introduce more sophisticated methods as the analysis unfolds. The example analyzes the Companies datafile supplied on your disk.
1 Launch Data Desk
Double-click the Data Desk program icon, or select it and choose Open from the File menu. Data Desk's first screen has four buttons offering the most common initial actions: Open an existing file, Paste data from Clipboard, Enter data from keyboard, and Help. We will work with an existing Data Desk file, but let's consider the menus first to give you an idea of where to find commands.
2 Examine the Menus
A program's menus tell you much about its capabilities and organization. Even with the initial display on the screen, you may examine the menus and select commands. Data Desk menus are organized according to their function:
File: Open, close, and save datafiles, import and export text data, and print the contents of any window.
Edit: General-purpose copying and pasting of almost anything; setting file preferences.
Data: Create, open, close, and duplicate Data Desk variables, folders, and other icons.
Special: Empty the trash, find objects, create slide shows, and control the selector and group buttons.
Modify: Modify almost any aspect of a plot to make plots more effective data analysis tools.
Manip: Generate random or patterned variables, sort, rank, transform, and otherwise manipulate variables.
Calc: Compute statistics and perform analyses.
Plot: Make graphs.
Data Desk has many capabilities, but you need only a few to get started. Data Desk uses submenus to keep more complex commands out of your way until you need them. Menu commands that have an arrow on the right (>) hold a submenu. Pause at that menu item with the mouse button held down to let the submenu drop down next to your mouse, and then drag sideways to enter the submenu. Submenus typically list related items such as fonts or windows, or commands that work together. For example, the New submenu in the Data menu lists the kinds of icons you can make. Try pulling the submenu down now: hold the mouse button down and slide to the right to enter the submenu.
The phrase "the Blank Variable command in the New submenu of the Data menu" is tedious, so this guide uses the shorthand notation {Data > New} Blank Variable to name the menu (Data), submenu (New), and command (Blank Variable). The notation shows the path to the command.
3 Entering and Editing Data
To create a variable, choose {Data > New} Blank Variable. Data Desk displays a dialog requesting a name for the variable. Type a name and click the OK button. You can rename the variable at any time.
Data Desk appends the icon of the new variable to the frontmost relation window and opens the variable in an editing window. The new blank variable has as many cases as its relation, but each case is blank. If no relation window is open, Data Desk creates a new relation that has no cases, names it Data, puts its icon in the Data folder in the File Cabinet, and puts the new variable's icon in that relation.
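The relation rules above (a new relation starts with no cases, a new blank variable gets one blank case for each case already in its relation, and appending a case extends every variable) can be modeled as a table of equal-length columns. The following Python sketch is only an illustration under that assumption; it is not Data Desk's internal format, and the function names and the variables Height and Weight are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model: a relation maps variable names to equal-length columns.
# None marks a blank case. This is NOT Data Desk's internal representation.
Relation = Dict[str, List[Optional[str]]]

def n_cases(relation: Relation) -> int:
    """Number of cases in the relation (0 for a relation with no variables)."""
    return len(next(iter(relation.values()))) if relation else 0

def add_blank_variable(relation: Relation, name: str) -> None:
    """Add a variable holding one blank case per existing case."""
    relation[name] = [None] * n_cases(relation)

def append_case(relation: Relation, values: Dict[str, str]) -> None:
    """Append one case to every variable; unspecified values stay blank."""
    for name, column in relation.items():
        column.append(values.get(name))

data: Relation = {}                       # a new relation with no cases
add_blank_variable(data, "Height")        # 0 cases, so an empty column
append_case(data, {"Height": "64"})       # like typing past the last case
append_case(data, {"Height": "70"})
add_blank_variable(data, "Weight")        # gets 2 blank cases to match
print(data)
```

Keeping every column the same length is what lets each row be read as one case across all the variables of a relation.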
To enter data, type values one row at a time, ending each row by pressing the Return key. You can make the window wider by dragging the size box to the right, or make it longer by dragging the size box downward or by clicking the zoom box. As you type, you replace the old blank cases or, if you are at the bottom of the variable window, you append new cases to the relation. When you are done typing, click the close box in the upper left corner of the window or choose {Data} Close. The editing window closes into the variable's icon.
4 Editing a Variable
Text editing works in essentially the same way as in other programs. Whenever a text editing window is frontmost, anything typed is either inserted in the window at the vertical blinking text insertion point or replaces text that is selected. The Backspace key deletes the character before the insertion point or the entire highlighted selection.
To alter a single value in a variable, edit it with the standard methods. Click between two characters in the text of a case to place a text insertion point between them, or drag across characters to select them. Type or paste to insert text at the blinking cursor or to replace the selected text.
Data Desk extends these conventions to data cases. Thus, to insert a case between two other cases, click between the cases to place a horizontal blinking case insertion point between those cases. Type or paste text to start a new case at that point. You can drag up or down across several cases to select them. If you drag off the top or bottom of a window, it scrolls automatically and selects cases as they become visible. Type or paste text to replace all the selected cases and begin inserting new cases at that point. Press Return to begin a new case.
You can tell whether a click will place a vertical or horizontal insertion point by the orientation of the mouse cursor.
A vertical "I-beam" places the vertical text insertion point, and a horizontal "cross-beam" places a horizontal case insertion point. As you move the mouse up and down along the cases in a variable, the cursor alternates between these two shapes.
5 Open a Datafile
* Open the datafile Companies.
Choose the {File} Open Datafile... command. Data Desk offers a list of available files. Use the dialog to select the Companies file and click the Open button. Data Desk opens the file and shows several icons. Each of these icons (except for Reference) is a variable holding a column of data on one aspect of each of 79 companies selected from the Forbes 500 list of 1986. For example, the icon named Company holds the names of the companies, and the icon named Assets holds the assets of each company in millions of dollars.
The icons behave just like operating system icons. You can drag them to another location, rename them in place, or open them to see their contents. To select an icon, touch it with the tip of the mouse cursor arrow and click. To select several icons, click the first and Shift-click the others in turn, or drag a selection rectangle around adjacent icons.
When you look for patterns in data, you usually will be more interested in relationships among the concepts that these icons represent than in the data values themselves. The icons thus help you think about data analysis without being overwhelmed by all the numbers.
6 Scroll and Select
Data Desk keeps each variable in its own icon. You need only open a variable to view or edit its data. You can open variables in any order you wish and position and size their windows as you please.
* Select some icons and open them.
Select the four leftmost variable icons (use Shift-click to select the second, third, and fourth icons, or drag a selection rectangle around them all) and choose {Data} Open. When open, the editing windows form a table of data much like a spreadsheet.
Each row holds data for a particular individual or case. For these data, each row represents a company. Each column is in its own window, but all columns are linked, so scrolling one window scrolls all of them, and selecting a row in one window selects that case in all windows. Dragging down across a case from top to bottom selects that case in all variables. Click on any data item to edit it.
* Close the variable editing windows.
Click the close box of each variable window in turn, or choose {Data} Close for each variable. Alternatively, you can close all the variable windows at once by choosing {Data > Close All} Variables.
* Reopen the variable Company.
Deselect the variables (click anywhere in the window holding variable icons except on a variable) and double-click the Company variable to open it. Having the variable Company open makes the company names available for labeling points in plots. If you quit and return to this guide later, be sure to open the Company variable again before proceeding. The only windows open now should be the Companies icon window holding the variable icons and the variable editing window holding company names.
7 Navigating in Data Desk
The large window now on your screen is the Data Desk window. Like the operating system desktop, it has windows, icons, and a Trash can in the lower right corner. Most Data Desk operations create new icons. Every plot, table, variable, slide, and action program has its own icon. Data Desk places these new icons in special folders designated for each particular type of object. All these folders are stored inside the Data Desk File Cabinet, located in the upper right corner of the Data Desk window. If the File Cabinet is not already open, double-click the File Cabinet icon to open a window holding the special storage folders.
Data Desk windows look slightly different from typical operating system windows. In particular, they have two special symbols in their title bars.
Click and hold the mouse on the triangle at the left of the title bar to pop up a HyperView menu for the window. HyperView menus hold commands and expert suggestions useful for a particular window. The document symbol to the left of the zoom box, at the right of the title bar, is an alias for the window's icon. Click on the icon alias to select the window's icon, or drag it anywhere (for example, to the Trash to discard the window).
8 Scatterplots
Scatterplots show relationships between pairs of variables and can tell you more about the data than you could learn by examining the variables separately. One of the best ways to start a data analysis is to make a few scatterplots.
By convention, the variable whose values are plotted on the vertical axis is denoted y, and the variable whose values are plotted on the horizontal axis is denoted x. Hold down the Option key to change the cursor to y and select the y-variable icon. Hold down the Shift key to change the cursor to x and select the x-variable icon. The two icons highlight differently to remind you of their different roles. As a shortcut, you can select the y-variable first without pressing Option and extend the selection with a Shift-click to select the x-variable. If the two icons are adjacent, drag a selection rectangle around them; the leftmost one will be y.
* Make a scatterplot of Assets versus Sales.
Select the Assets icon as the y-variable. Then hold down the Shift key and click the Sales icon to select it as the x-variable. Choose {Plot} Scatterplots. Data Desk creates a Results folder inside the File Cabinet, places the scatterplot's icon in the Results folder, and opens the scatterplot automatically. Because each plot is in its own window, you can see several side by side.
Data Desk ordinarily draws plots as white points on a black field. This form works well on the computer screen but is not a good way to print plots, so Data Desk inverts plots to black points on a white field for printing.
Every Data Desk plot or analysis has its own window and closes into its own icon. Open the Results folder, which is in the File Cabinet in the upper right corner, to see the scatterplot's icon. The icon is gray because the scatterplot is open. The Results folder keeps the icons of your plots and analyses in the order in which you make them, recording your analysis so that you can always go back to a previous step and look at it again. Close the Results folder.
The plot shows a few large companies with high assets and sales. These have forced most of the companies into the lower left of the plot, where they clump together, ruining any chance to see patterns among most of the points. You can reposition the points by simply grabbing them with the mouse. Point to the middle of the plot, press the mouse button, and drag the mouse around. (The mouse cursor should look like a hand. If it doesn't, go on to the next section to learn how to change it.)
Data Desk plots aren't just ways for the program to display your data. They also are ways for you to talk to the program about the data, much as you might discuss a plot with a friend ("I want to recode this point."). There are many ways to work with plots, but some of the most effective use the tools and actions found in the plot modification palettes.
9 Plot Modification Palettes
Data Desk provides palettes of tools, symbols, selection modes, colors, and slide show navigation tools. The two principal palettes are the Symbols palette, offering a choice of eight plot symbols, and the Plot Tools palette, holding twelve plot tools. If you have a color monitor set for 256 colors, a Colors palette offering a choice of 64 colors joins the other palettes. The palettes float above other windows to stay readily at hand. Each has a close box to put it away.
* Open the palettes.
The {Modify} Palettes command places all the palettes on the desktop.
Each tool in the Plot Tools palette performs a special function, and each works in every plot for which it makes sense. To pick up a tool, click on it in the palette. To use it, position the mouse over a plot window and press the mouse button. For example:
* Pick up the query tool by clicking on it.
We can now identify the large company that has ruined the scatterplot. For the query tool to display a company name, the variable Company must be open and in front of any other variable editing windows. The query tool looks like a bombsight when the mouse is over the plot. Zero in on the big company and hold down the mouse button to identify the outlier. It is IBM.
You should now have three open windows inside the Data Desk window: the window holding variable icons, labeled Companies; the Company variable editing window; and the scatterplot named Ass/Sl Plot.
10 Transform the Data
Data rarely come in the best form for use, yet conventional statistics programs do little to help. Data Desk makes it easy to transform numeric values, recode group identifiers, correct errors, or treat extreme values specially. Such manipulations often simplify patterns in the data, let equations fit the data better, and help satisfy assumptions that many common statistical methods require.
For the companies data, the wedge shape of the scatterplot suggests that these variables would be easier to work with in a different form. In fact, economists would typically re-express several of these variables on the log scale, and we should do the same. It is easy to re-express variables in Data Desk.
* Select the variables Assets, Sales, and Market Value, and choose {Manip > Transform} Log.
Data Desk creates three new icons and places them to the right of the variable icons. These new icons look like variables, except that they have arithmetic symbols on them to indicate that they are computed from the data. In Data Desk, they are called derived variables.
* Open the derived variable LSl.
Deselect the variables, then double-click the variable LSl to open it. Derived variables do not hold data values; instead, they hold the expression from which the data values are computed, as the LSl window shows. You can edit this expression as you would any text. For now, close the window without changing it.
Derived variables work like ordinary variables. They are especially handy because they are always consistent with the data. For example, if you change one of the values in the Sales variable, the corresponding value in LSl changes instantly. This consistency is a hallmark of Data Desk operation. Data analysis often leads you to discover and correct errors or to try alternative "what if" analyses, and Data Desk's consistency makes these steps simple and natural.
* Scatterplot LAss versus LSl.
Select the icon of LAss first as the y-variable and then the icon of LSl, with a Shift-click, as the x-variable; then choose {Plot} Scatterplots. The new scatterplot opens into its own window near the scatterplot of the raw data.
This plot shows a trend of higher assets corresponding to higher sales. It also shows an interesting strip of points along its upper border, stretching in a straight line upward but separated from the body of the plot. You might wonder whether the companies forming this strip have anything in common. The easiest way to tell is to identify them with the query tool. (You already know how. Go ahead. We'll wait for you.)
11 Modify the Plots to See More
Every one of the companies in the strip is a bank, which makes sense. Banks should be high in assets and low in sales, although we might not have anticipated such a striking separation from the other companies. The banks bear watching, and we can make them stand out in several ways: by highlighting them, by coloring them, or with a special plot symbol.
* Plot the banks with a different symbol.
Select the banks by drawing a loop around them with the lasso tool, which calls for a steady hand.
Because the bank strip is diagonal, you cannot easily select the banks with the rectangle selector. If you miss a few, hold the Shift key down (to extend the selection) and lasso the strays. Now click on the x in the Symbols palette. The selected points become x's instantly in both scatterplots.
The banks formed a strip in the original scatterplot, but we couldn't easily pick it out until we had transformed the data to make the relationships simpler. Note how different the two plots of the same data look. Choosing the right transformation can help almost any data analysis. If you are working in color, you can also make the banks stand out by coloring them differently; Section 18 shows how.
You should now have the following windows open: the Companies window, the Company variable editing window, two scatterplots, and the plot modification palettes.
12 Pie Charts
Because the banks stand out so strikingly, you might wonder about other sectors of the business market. The variable sector holds the name of the market sector in which each company does its principal business. Because sector holds sector names rather than numbers, you must choose an appropriate display. Pie charts and bar charts show the relative size of each group in a category variable and are appropriate displays for sector. Market sector is ordinarily thought of as a somewhat arbitrary partitioning of the business world into groups, so a pie chart seems most appropriate.
* Make a pie chart of sector.
Select sector and choose {Plot} Pie Charts. The pie chart opens automatically near the other plots. Like all Data Desk displays, the pie chart works dynamically with other displays to show you more about your data. For example, click on either the name HiTech or its slice of the pie to select all the companies in the HiTech sector in all the open windows. As you can see, the pie chart provides a convenient way to select (and study) any of the market sectors.
* Look at each market sector in turn.
Click on each sector in the pie chart and look at the scatterplot on the log scale. Each market sector forms a strip through the plot. This pattern is remarkable: within each market sector there is a straight-line relationship between log Assets and log Sales, and the lines for different market sectors are nearly parallel but at different levels. It is hard to imagine how this pattern could be discovered without the dynamic links between plots.
As often happens with patterns in data, the points that do not follow the pattern are interesting in their own right. For example, clicking on the Finance sector shows three companies that differ from the banks. What is different about these three companies? Identifying them with the query tool, we find two general financial services and insurance companies (Dreyfus and Cigna). The third is H&R Block, which behaves more like a member of the retail sector than like a member of the finance sector, as befits its storefront business style.
13 HyperView Menus
Data Desk's displays and tables provide expert suggestions of related plots and analyses, along with shortcuts to obtain them. They do so with special HyperView pop-up menus attached to specific parts of tables and displays. When your mouse is over a HyperView menu, the cursor changes to the button hand. Press the mouse button to pop up the menu. The triangle at the top left corner of the pie chart window's title bar is also a HyperView menu button; when the mouse is over that part of the window, the cursor changes to the button hand.
* Examine the pie chart HyperView menu.
Press the mouse over the triangle to pop up the window's HyperView menu. A HyperView menu offers commands related to the window. Here it suggests that a good next step might be a bar chart or a frequency table. This suggestion is an example of how HyperView menus guide you through an analysis.
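The straight, nearly parallel strips noted in Section 12 have a simple algebraic reading. As a sketch, assuming base-10 logs and writing A for Assets, S for Sales, and s, t for two market sectors (the symbols a_s, a_t, and b are introduced here for illustration and are not values Data Desk reports), a common slope b with a sector-specific level a_s means:

```latex
\log_{10} A \approx a_s + b \log_{10} S
\quad\Longleftrightarrow\quad
A \approx 10^{a_s} S^{b}

% At equal sales S, compare a company in sector s with one in sector t:
\log_{10} A_s - \log_{10} A_t \approx a_s - a_t
\quad\Longrightarrow\quad
\frac{A_s}{A_t} \approx 10^{a_s - a_t}
```

So on the log scale a fixed vertical gap between two parallel strips corresponds to a fixed ratio of assets at equal sales, which is one reason the log scale makes these sector differences easy to see.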
14 Frequency Tables
Frequency tables summarize how a discrete variable is distributed among its categories. They show precisely, with numbers, what a pie chart displays graphically.
* Make a frequency table of sector.
Select Frequencies of Sector from the pie chart's HyperView menu. Alternatively, select the variable sector in the icon window and choose {Calc} Frequency Breakdown. The frequency table has its own HyperView menus attached both to the triangle symbol and to each of the sector names. You may want to examine these HyperView menus now.
15 Recoding
Several of the sectors have only a few companies in them. Let's group them into one larger sector. This kind of recoding is a common chore. Data Desk links plots and editing windows, which simplifies recoding and shows you what you are doing at each step. For example, companies selected in the pie chart, frequency table, or scatterplot are selected for editing in all open editing windows as well. This consistency makes recoding simple.
* Select the sectors Communication, Medical, Other, and Transportation in the pie chart.
Select the sectors graphically. For example, click on the name of one sector in the pie chart, then Shift-click (to extend the selection) on the others in turn. The selected points highlight in the scatterplots and (more important for our purpose here) in the sector editing window.
* Open the variable sector and recode the selected cases.
To make the window for the variable sector frontmost without changing the selection, click on its title bar. The {Edit} Replace command replaces the selected cases with the text you specify. Recode the cases into a new Others category: type Others in the dialog window and click OK. Because you can actually see the cases that you are recoding, you are much less likely to recode the wrong cases by mistake.
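The frequency table of Section 14 and the {Edit} Replace recode just described amount to counting category labels and then overwriting only the selected cases. A minimal Python sketch, assuming a short hypothetical list of sector labels rather than the actual Companies data; the names frequency_table, small_sectors, and selected are illustrative and are not Data Desk commands:

```python
from collections import Counter

# Hypothetical sector labels for ten companies (NOT the Companies datafile).
sector = ["Finance", "HiTech", "Medical", "Finance", "Transportation",
          "Energy", "Communication", "HiTech", "Other", "Finance"]

def frequency_table(values):
    """Count the cases in each category, most frequent first."""
    return Counter(values).most_common()

# Stand-in for Shift-clicking the small sectors in the pie chart:
# the "selection" is the list of case indices whose sector is in this group.
small_sectors = {"Communication", "Medical", "Other", "Transportation"}
selected = [i for i, s in enumerate(sector) if s in small_sectors]

# Stand-in for {Edit} Replace: overwrite only the selected cases.
for i in selected:
    sector[i] = "Others"

table = frequency_table(sector)
print(table)
```

Because the recode touches only the selected indices, every unselected case keeps its original label, which mirrors the point that you can see exactly which cases you are changing.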
As soon as you change the data, both the pie chart and the frequency table windows show an exclamation mark in place of the HyperView menu symbol in the title bar. This new symbol indicates that these views of the data are out of date because they no longer correspond to the data values.
16 Updating Windows
Data Desk does not merely inform you when a view of the data is out of date; it offers to update that view or to provide both the "before" and "after" views of the data. You can set a window to update automatically, but usually it is best to be notified that a window needs updating and to oversee the change. Often you can learn much about changes in the data by watching how they change your results.
When a window is out of date, the window's HyperView menu symbol changes to an exclamation mark. Click on the exclamation mark to pop up a menu that offers to update the window in place or to redo it in a new window.
* Update the Pie Chart and Frequency windows in place.
The ability to modify, correct, or extend your data and then update existing plots and tables is essential for data analysis. It is one example of the principle that in Data Desk each view of your data is another opportunity to pick it up, look at it differently, and learn a bit more about it. The following sections describe still more ways.
* Close the Frequency, Pie Chart, sector, and raw-data scatterplot windows.
You should have only the Companies window, the Company variable window, and the scatterplot of LAss and LSl open. In that scatterplot, the banks should still be plotted with the x plot symbol.
17 Rotating Plots
It is easy to get a good idea of the relative size of several groups from a pie chart. Usually you can see simple trends in a scatterplot. But the really important patterns in data often involve three or more variables. Data Desk can display relationships among three, four, or more variables in several intuitive ways and relate these displays to the rest of your data analysis.
In the following steps we will make a display that shows relationships among five variables. One key to this remarkable power is Data Desk's rotating plot capabilities.
Note: For this section especially, we urge you to try the methods on your computer. It is simply impossible to capture on paper the vivid impression of three-dimensional motion on the screen.
* Make a rotating plot of LAss, LSl, and LMV.
If we were considering investing in these companies, we would want to understand how their market value (the total value of the company's stock) relates to the other variables. (Market value, too, works best once we take logs.) Select the three derived variables LAss, LSl, and LMV, in that order, and choose {Plot} Rotating Plot.
At first the plot looks like an ordinary scatterplot except that the axes go through its middle. If you select the variables in the specified order, LAss is plotted on the y-axis (up-down), LSl on the x-axis (left-right), and LMV on the z-axis (in-out). Because the z-axis points into the screen (away from you), you can't see it yet.
* Rotate the plot.
To see all three dimensions, select the rotate tool (the hand with an arrow in it) from the Plot Tools palette, grab the points in the rotating plot by holding down the mouse button, and drag them. The rotate tool controls the plot as if the points were inside a globe mounted on gimbals that let it spin in any direction. When you put your hand on the globe and push, it spins in the direction of your push.
The overall structure of this plot shows the general trend of all three variables. Some big companies have high market values, substantial sales, and large assets. Smaller companies (relatively speaking; this is the Forbes 500, after all) tend to be smaller on all three dimensions. Rotating the plot both creates a vivid illusion of three dimensions and makes it easy to orient the point cloud to reveal patterns.
For example, you might find a view of the data in which an unusual point sticks out from the rest or a view in which the points cluster together into separate groups. In this plot, the banks (still plotted with x's) do not form the simple strip we saw in the scatterplot, but rather an L shape. If you look carefully, you can see that the banks veering off from the others to form the L are unusual because their market values are low. (The axis lines point toward larger values on each dimension, so the stray banks lie toward the low end of the LMV range.)
Unusual cases are always interesting; indeed, they often reveal more than the rest of the data. Good data analysis requires that you keep your eye on them so that you can learn why they are special. The plot indicates that the stock of the company at the tip of the L, United Financial Group, may be underpriced. You will learn more about United Financial as the analysis proceeds.
* Set the plot rotating.
Push the points with the rotate tool and release the mouse button while the mouse is still moving. The points continue to rotate at the speed and in the direction that you push them. To stop the rotation, grab the plot with the rotate tool by holding down the mouse button, or press the Space bar on your keyboard.
You can use plot rotation to find interesting orientations of the data and then record those orientations as new variables for use in other analyses and plots.
18 Color
Data Desk displays are designed to help you learn about your data. Rather than using color for accent or decoration, Data Desk uses it to display more information about your data. Colors can differentiate groups, emphasize special cases, or show values of an additional variable. If you have a color monitor, the {Modify} Palettes command places the Colors palette on the desktop along with the other modification palettes.
* Color by Group.
Select the variable icon sector, bring the scatterplot of LAss versus LSl to the front by clicking on its title bar, and choose {Modify > Colors > Add} By Group. Data Desk assigns colors to the points in the scatterplot according to their market sectors. Choose {Edit} Select All to highlight the points and make their colors easier to see. Select different sectors in the pie chart and observe the points that highlight in the scatterplot. Their colors match the pie chart colors. By adding a third variable to the plot, the colors show all the market sector strips noted in Section 12 at once.
Data Desk keeps colors consistent across most plots. The rotating plot shows the same colors automatically. You may want to rotate it to look for patterns.
19 Extraordinary Cases
Although patterns in data are almost always interesting, the most interesting thing about them can be the individuals that fail to follow the pattern. In these data, United Financial Group has a low market value. Possibly, United Financial would have been a good investment (these are 1986 values). The analysis thus far raises a new question: "Why does the market value of United Financial appear to be low?" We will find two parts to the answer. One comes from other variables in the dataset, and another comes from information not in the data.
You should have the Companies window, Company variable window, scatterplot, and rotating plot open, along with the modification palettes. Close any other windows that may be open.
20 Construct a New Variable
United Financial's stock price might be depressed because it had poor earnings. Let's take a look at profits. A good way to start is to look at their overall distribution.
* Make a histogram of Profits.
Select Profits and choose {Plot} Histograms. This histogram tells you several things:
• There are some extremely high values.
• There are some negative values.
• The distribution is clumped together too much to be a good discriminator among companies.
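Why is the histogram so clumped? If bins of equal width span the whole range of the data, a few very large values stretch that range so far that nearly all the other values fall into the first bin. A Python sketch with hypothetical profit values (not the Companies data); equal-width binning from the minimum to the maximum is an assumption made here for illustration, and Data Desk may choose its bins differently. The function name equal_width_counts is hypothetical.

```python
# Hypothetical profits in $ millions, with two very large values
# (illustrative only; NOT the Companies datafile).
profits = [-120, -15, 8, 22, 35, 41, 57, 60, 74, 90, 130, 2500, 6600]

def equal_width_counts(values, n_bins):
    """Count values in n_bins equal-width bins running from min to max.

    Assumes the values are not all equal, so the bin width is positive.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum lands in the last bin instead of past it.
        k = min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts

counts = equal_width_counts(profits, 10)
print(counts)
```

Here 11 of the 13 hypothetical values share the first bin, so the histogram cannot tell those companies apart; re-expressing or rescaling the variable, as the guide does next, is one way around this.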
(Your histogram may show some bars partially filled in. Data Desk shows selection in histograms by highlighting a portion of the bar.)
Perhaps you can do better than to use profits alone. Click on the histogram's close box to place it into the Results folder. Perhaps you could see more by adjusting each company's profits by its sales volume.
* Construct a derived variable computing Profits/Sales.
Select the variable Profits as y, Shift-click to select the variable Sales as x, and choose {Manip > Transform > Arithmetic} y/x to create the derived variable Prf/Sl. Dividing Profits by Sales gives a profitability measure independent of company size.
* Color by value.
Click on the rotating plot, select Prf/Sl, and choose {Modify > Colors > Add} By Ranks. Data Desk colors the points with colors selected from the top half of the Colors palette. The lowest numeric values (least profitable companies) are pale blue and the highest values (most profitable) are red. Coloring by ranks rather than by the raw values spreads the distribution of Prf/Sl evenly across the colors and so makes more effective use of color.
Rotate the plot to see where the biggest money makers and losers are, and identify them with the query tool. The colors show both consistent trends and occasional extraordinary points; both are interesting patterns worthy of further investigation. Note, for example, that United Financial was losing money.
The rotating plot now shows five variables at once: LMV, LAss, and LSl as the three spatial dimensions for rotation, Prf/Sl in color, and banks versus other companies as x symbols. This plot is truly multivariate.
You should have the Companies window, Company variable window, scatterplot, and rotating plot open, along with the modification palettes.
21 Identify a Group
* Make a histogram of Prf/Sl.
Profits/Sales has a symmetric, bell-shaped distribution with perhaps one extreme value.
When you use traditional statistics, variables distributed like this are more likely to satisfy common assumptions, they show patterns more easily, and they often are easier to understand. Let's examine companies that lost money and compare them to the profitable companies. Select the lowest three bars of the histogram with the knife tool. These are the bars having values less than 0. (All sales values are positive, so a negative profits/sales ratio must mean negative profits, that is, losses.) Create a new derived variable with the {Data > New} Derived Variable command. Name it Losers/Winners. Type the expression IF 'Prf/Sl' < 0 THEN "loss" ELSE "profit". Data Desk lets you name categories rather than code them with numbers because names offer a more natural way to work. Data Desk can then use the category names to label results, making them easier to read.

22 Summarize by Groups

Let's consider other ways in which profitable and unprofitable companies might differ.

* Choose {Calc > Calculation Options} Select Summary Statistics.

Data Desk displays a choice of summary statistics. Let's include the Median along with the Mean; we have already seen some distributions with extreme values, for which the median might be a better location estimate than the mean.

* Click the check box next to Median.

Doing so includes the median among the summary statistics to be computed. Now click OK to close the dialog and record your choices. You are still interested in whether profitability accounts for the poor market value of some companies, so

* Compute summaries of Market Value. Select the Market Value variable as y and Losers/Winners as x; then choose {Calc > Summaries} Reports By Groups.

Data Desk creates a summary report in which each row holds summary statistics for one category of the variable Losers/Winners.

23 Output Tables as Templates

We have already shown that Data Desk's plots provide opportunities to specify new displays or analyses. Data Desk output tables do the same.
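The Losers/Winners expression and the Reports By Groups summary in Sections 21 and 22 amount to assigning each case a category and then computing statistics within each category. The following Python sketch illustrates that idea under stated assumptions: the numbers are invented, and the helper summaries_by_group is hypothetical, not part of Data Desk.

```python
from statistics import mean, median

# Invented values for illustration only; not the Companies data.
prf_sl = [0.08, -0.02, 0.02, 0.12, -0.05]
market_value = [2100.0, 300.0, 850.0, 4000.0, 150.0]

# Same logic as the derived variable: IF 'Prf/Sl' < 0 THEN "loss" ELSE "profit"
losers_winners = ["loss" if v < 0 else "profit" for v in prf_sl]


def summaries_by_group(y, groups):
    """Return {category: (count, mean, median)} for y split by category labels."""
    by_cat = {}
    for value, cat in zip(y, groups):
        by_cat.setdefault(cat, []).append(value)
    return {cat: (len(vals), mean(vals), median(vals)) for cat, vals in by_cat.items()}


report = summaries_by_group(market_value, losers_winners)
# One row per category, as in the summary report table.
for cat in sorted(report):
    n, m, med = report[cat]
    print(f"{cat:>6}  n={n}  mean={m:.1f}  median={med:.1f}")
```

Because the categories are text labels rather than numeric codes, they can be printed directly as row labels, which is the convenience the Guide describes.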
The HyperView menu attached to the variable name Market Value in the summary output table offers a variety of plots. You can also use this table as a template to compute more summaries similar to these. For example, you might wonder how these groups (profit, loss) differ in average sales, or how sales vary across the different market sectors. You needn't repeat the actions that generated this table. Instead, you can treat it as a template to obtain what you want.

* Drag the variable Sales onto the Market Value label in the summary table.

The window acknowledges the drag with a highlighted border. When you release the mouse button to drop the icon, the window automatically recomputes and displays a new table summarizing Sales. Surprisingly, the average sales of the unprofitable companies were higher than the average sales of the profitable companies. You can easily see the power and convenience of this object-oriented way of working with analyses. Let's look at the differences between profitable and unprofitable companies by sector.

* Make a bar chart of sector. Select the variable sector and choose {Plot} Bar Charts.

Use the knife tool and click on the bar above the label Fnn to select the Finance sector.

* Compute summaries of Sales by Losers/Winners for the Finance sector. Click on the line No Selector in the summary report table and choose Use Hot Selector from the HyperView menu.

Data Desk automatically recomputes the summary statistics for the selected subset of the data, in this case financial companies. Note that you are still looking at the profitable and unprofitable companies separately. Select another sector (or other sectors) in the bar chart. The summary table places an exclamation mark in its HyperView menu. Choose Turn On Automatic Update from the HyperView menu of the summary table. From then on, Data Desk recalculates the summary statistics for the selected subset instantly whenever the selection changes.
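Reusing the summary table as a template with a hot selector corresponds roughly to rerunning the same grouped summary with a different y variable and a filter on the cases. A hedged Python sketch of that idea follows; the data and the summaries_by_group helper are hypothetical, and the boolean list finance merely stands in for the selection made with the knife tool in the bar chart.

```python
from statistics import mean, median

# Invented records for illustration only; not the Companies data.
sector = ["Fnn", "Fnn", "Enr", "Fnn", "Enr", "Rtl"]
losers_winners = ["loss", "profit", "profit", "profit", "loss", "profit"]
sales = [5200.0, 1800.0, 2600.0, 2400.0, 3900.0, 1500.0]


def summaries_by_group(y, groups, selected=None):
    """Group summaries of y, optionally restricted to cases where selected[i] is True."""
    if selected is None:
        selected = [True] * len(y)  # "No Selector": use every case
    by_cat = {}
    for value, cat, keep in zip(y, groups, selected):
        if keep:
            by_cat.setdefault(cat, []).append(value)
    return {cat: (len(v), mean(v), median(v)) for cat, v in by_cat.items()}


# Stand-in for selecting the Fnn bar and choosing Use Hot Selector.
finance = [s == "Fnn" for s in sector]
result = summaries_by_group(sales, losers_winners, finance)
print(result)
```

Building a new selection list (for example, selecting both Fnn and Enr) and calling the function again roughly mirrors what Automatic Update does each time the selection changes in the bar chart.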
Such dynamic subset analysis gives you the power to look at any selection of cases that you can define graphically in any Data Desk display or with a logical expression in a derived variable. Object-oriented interaction lets you work with results the way you think about them. A good way to learn about data is to compare different things under similar circumstances. Here, you looked at summaries of Market Value by group and wondered what Sales would show in the same circumstances. The goal was a summary "just like this one, only for Sales." If you said "this one" out loud to a friend, you would point to the summary table. That is just what you do to tell Data Desk what you want. The most direct way to do so is to modify that table rather than to think of a new command or sequence of actions. Data Desk lets you do just that. You can modify any output table or plot by dragging new variables into it, by removing variables from it, or by restricting the analysis to a subset of your data. In this fundamental way Data Desk works the way you do, progressing by basing each new step on the last one. You now have many windows open. The {Data > Close All} Displays command closes displays back into their icons in the Results folder. Open the Results folder.

24 Taking Stock

What have you learned so far? First, bigger companies tend to have higher assets, sales, and market values. Second, there is a linear relationship between log Assets and log Sales, with parallel trends for different market sectors. In particular, banks seem to be different from the other companies. (It is interesting to note that, unlike Forbes, Fortune magazine excludes banks from its "500" list.) Third, Profits is an unruly measurement that may be hard to work with but, when standardized as Profits/Sales, it appears to be well-behaved. Also, sales do not appear to be a good determinant of profitability.
Finally, one or two anomalous banks seem to have lower market values than would be expected given their assets, sales, and status as banks. One possible explanation is that their profits/sales are lower than those of the other banks. Along the way you computed no statistics beyond simple summaries. You made several kinds of displays, transformed data, constructed new variables, and looked at the data from many different points of view. Throughout the analysis, Data Desk kept every view of the data consistent so that you could easily relate whatever you learned in one window to all of the others. Although we started with no firm questions, we have now formulated several interesting ones.
• Why is there a linear relationship between log Assets and log Sales? What do the different levels for different market sectors tell us about how market sectors differ? What would it mean for a company to deviate from the trend for its market sector?
• Why are the banks so different from other companies in the sample? Should banks be excluded from analyses of such data (or from economic indicators)?
• What is special about United Financial Group that made its market value so low?
Most of these questions take us beyond the data at hand. Here again, Data Desk is unconventional. We believe that data analyses should consider a larger framework than the particular data at hand. Often the best result of a data analysis is not just an answer, but rather a better question. When statistical analyses are isolated exercises that rush from data to a test (preferably one significant at p ≤ .05), they are inherently sterile, for they ignore the fact that analyses should be part of an ongoing effort to learn.

25 Winding Down, Writing Up

Most data analyses do not stop with what you learn about your data. Usually you will want to record your conclusions for yourself and for others. Data Desk offers a variety of ways to do this.
For written presentations, you can use your favorite word processor, page layout program, or graphics editing program. The {Edit} Copy Window command, available whenever the front window is a plot or table, places on the Clipboard a picture of any plot or a tab-delimited text version of any table. You can paste these directly into another program. Layout windows in Data Desk are an effective place to record the progress of your analysis, to create presentations of your data, and to design figures that combine plots, tables, and text for use in other programs. To make a new layout window, choose {Data > New} Layout. You can drag in the icons of open windows to place a picture of a display or table in the layout, or drag in a closed icon to place a button in the layout that can locate the original window. You can add annotations and comments in editable text boxes to produce interactive demonstrations for customers or colleagues.

26 Saving Datafiles

The {File} Save Datafile command updates the open file to reflect any changes made since the last save. The Save Datafile As… command saves the current version of the data, including any changes not yet recorded in the datafile, under a new name. The original datafile (under its original name) remains unchanged.